3D Recursive Gaussian IIR on GPUs and FPGAs: A Case Study for Accelerating Bandwidth-Bounded Applications
Authors
Abstract
GPU devices typically have higher off-chip bandwidth than FPGA-based systems, so GPUs would be expected to perform better on bandwidth-bounded, massively parallel applications. In this paper we present our implementations of a 3D recursive Gaussian IIR on multicore CPU, many-core GPU and multi-FPGA platforms. Our baseline implementation on the CPU features the smallest arithmetic computation (2 MADDs per dimension). Since this application is clearly bandwidth bounded, we show that the differences in the memory subsystems of the different platforms require different bandwidth optimization techniques. Our implementations on the GPU and FPGA platforms show a 26x and 33x speedup, respectively, over the optimized single-thread code on the CPU.
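To make the "2 MADDs per dimension" figure concrete, here is a minimal sketch of a separable first-order recursive smoother. It rests on an assumption: that the two multiply-adds correspond to one causal and one anti-causal pass along each axis. The abstract does not give the filter equations, so the names `smooth_line` and `smooth_3d` and the coefficient `a` are hypothetical and this is not the authors' implementation.

```python
# Hypothetical sketch: first-order recursive (IIR) smoothing applied
# separably along x, y and z. Not taken from the paper.

def smooth_line(line, a):
    """Forward then backward recursive pass over one 1D line.

    Each pass does one multiply-add per sample: y += a * (neighbor - y).
    """
    out = list(line)
    # Causal pass: each sample is pulled toward the previous output.
    for i in range(1, len(out)):
        out[i] = out[i] + a * (out[i - 1] - out[i])
    # Anti-causal pass: same recurrence, running right to left.
    for i in range(len(out) - 2, -1, -1):
        out[i] = out[i] + a * (out[i + 1] - out[i])
    return out


def smooth_3d(vol, a):
    """Apply smooth_line along x, then y, then z of vol[z][y][x]."""
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    out = [[list(row) for row in plane] for plane in vol]
    # x axis: rows are stored contiguously in this layout.
    for z in range(nz):
        for y in range(ny):
            out[z][y] = smooth_line(out[z][y], a)
    # y axis: gather a column, filter it, scatter it back.
    for z in range(nz):
        for x in range(nx):
            col = smooth_line([out[z][y][x] for y in range(ny)], a)
            for y in range(ny):
                out[z][y][x] = col[y]
    # z axis: same gather/filter/scatter across planes.
    for y in range(ny):
        for x in range(nx):
            col = smooth_line([out[z][y][x] for z in range(nz)], a)
            for z in range(nz):
                out[z][y][x] = col[z]
    return out
```

With so little arithmetic per sample, the cost of a filter like this is dominated by reading and writing the volume. In a row-major layout the y and z passes walk memory with a stride, which is the kind of access pattern that would call for platform-specific bandwidth optimizations.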
Similar resources
An Adaptive Self-adjusting Bandwidth Bandpass Filter without IIR Bias
In this paper we introduce a simple, computationally inexpensive, adaptive recursive structure for enhancing bandpass signals highly corrupted by broad-band noise. This adaptive algorithm, enhancing input signals, enables us to estimate the center frequency and the bandwidth of the input signal. In addition, an important feature of the proposed structure is that the conventional bias existing ...
Accelerating high-order WENO schemes using two heterogeneous GPUs
A double-GPU code is developed to accelerate WENO schemes. The test problem is a compressible viscous flow. The convective terms are discretized using third- to ninth-order WENO schemes and the viscous terms are discretized by the standard fourth-order central scheme. The code, written in the CUDA programming language, is developed by modifying a single-GPU code. The OpenMP library is used for parall...
A Configurable VHDL Template for Parallelization of 3D Stencil Codes on FPGAs
2D and 3D stencil code applications are very common in scientific computing, but their performance is mostly limited by the memory bandwidth. Elaborate on-chip buffering techniques minimize memory transfers, but they cannot be directly realized on fixed general-purpose processors or GPUs. FPGAs instead offer flexibility regarding the processing scheme, the degree of parallelism and the numerical...
Are FPGAs Suitable for Edge Computing?
The rapid growth of Internet-of-things (IoT) and artificial intelligence applications has called forth a new computing paradigm: edge computing. In this paper, we study the suitability of deploying FPGAs for edge computing from the perspectives of throughput sensitivity to workload size, architectural adaptiveness to algorithm characteristics, and energy efficiency. This goal is accomplished by...
Publication date: 2011